{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Pandas基础\n",
    "<table align=\"left\">\n",
    "  <td>\n",
    "    <a target=\"_blank\" href=\"http://nbviewer.ipython.org/github/ShowMeAI-Hub/awesome-AI-cheatsheets/blob/main/Pandas/pandas基础-cheatsheet-code.ipynb\"><img src=\"https://raw.githubusercontent.com/jupyter/design/master/logos/Badges/nbviewer_badge.svg\" />在nbviewer上查看notebook</a>\n",
    "  </td>\n",
    "  <td>\n",
    "    <a target=\"_blank\" href=\"https://colab.research.google.com/github/ShowMeAI-Hub/awesome-AI-cheatsheets/blob/main/Pandas/pandas基础-cheatsheet-code.ipynb\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" />在Google Colab运行</a>\n",
    "  </td>\n",
    "  <td>\n",
    "    <a target=\"_blank\" href=\"https://github.com/ShowMeAI-Hub/awesome-AI-cheatsheets/tree/main/Pandas/pandas基础-cheatsheet-code.ipynb\"><img src=\"https://badgen.net/badge/open/github/color=cyan?icon=github\" />在Github上查看源代码</a>\n",
    "  </td>\n",
    "  <td>\n",
    "    <a target=\"_blank\" href=\"https://github.com/ShowMeAI-Hub/awesome-AI-cheatsheets/Pandas/pandas基础-速查表.pdf\"><img src=\"https://badgen.net/badge/download/pdf/color=white?icon=github\"/>下载速查表</a>\n",
    "  </td>\n",
    "</table>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 说明\n",
    "**notebook by [韩信子](https://github.com/HanXinzi-AI)@[ShowMeAI](https://github.com/ShowMeAI-Hub)**\n",
    "\n",
    "更多AI速查表资料请查看[速查表大全](https://github.com/ShowMeAI-Hub/awesome-AI-cheatsheets)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Pandas是基于Numpy创建的Python库，为Python提供了易于使用的数据结构和数据分析工具。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "使用以下语句导入Pandas库："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Pandas数据结构"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Series - 序列"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "存储任意类型数据的一维数组"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### DataFrame - 数据帧"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "data = {'Country': ['Belgium', 'India', 'Brazil'],\n",
    "        'Capital': ['Brussels', 'New Delhi', 'Brasília'],\n",
    "        'Population': [11190846, 1303171035, 207847528]}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.DataFrame(data, columns=['Country', 'Capital', 'Population'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Country</th>\n",
       "      <th>Capital</th>\n",
       "      <th>Population</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Belgium</td>\n",
       "      <td>Brussels</td>\n",
       "      <td>11190846</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>India</td>\n",
       "      <td>New Delhi</td>\n",
       "      <td>1303171035</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Brazil</td>\n",
       "      <td>Brasília</td>\n",
       "      <td>207847528</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Country    Capital  Population\n",
       "0  Belgium   Brussels    11190846\n",
       "1    India  New Delhi  1303171035\n",
       "2   Brazil   Brasília   207847528"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 调用帮助"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Help on property:\n",
      "\n",
      "    Access a group of rows and columns by label(s) or a boolean array.\n",
      "    \n",
      "    ``.loc[]`` is primarily label based, but may also be used with a\n",
      "    boolean array.\n",
      "    \n",
      "    Allowed inputs are:\n",
      "    \n",
      "    - A single label, e.g. ``5`` or ``'a'``, (note that ``5`` is\n",
      "      interpreted as a *label* of the index, and **never** as an\n",
      "      integer position along the index).\n",
      "    - A list or array of labels, e.g. ``['a', 'b', 'c']``.\n",
      "    - A slice object with labels, e.g. ``'a':'f'``.\n",
      "    \n",
      "      .. warning:: Note that contrary to usual python slices, **both** the\n",
      "          start and the stop are included\n",
      "    \n",
      "    - A boolean array of the same length as the axis being sliced,\n",
      "      e.g. ``[True, False, True]``.\n",
      "    - An alignable boolean Series. The index of the key will be aligned before\n",
      "      masking.\n",
      "    - An alignable Index. The Index of the returned selection will be the input.\n",
      "    - A ``callable`` function with one argument (the calling Series or\n",
      "      DataFrame) and that returns valid output for indexing (one of the above)\n",
      "    \n",
      "    See more at :ref:`Selection by Label <indexing.label>`.\n",
      "    \n",
      "    Raises\n",
      "    ------\n",
      "    KeyError\n",
      "        If any items are not found.\n",
      "    IndexingError\n",
      "        If an indexed key is passed and its index is unalignable to the frame index.\n",
      "    \n",
      "    See Also\n",
      "    --------\n",
      "    DataFrame.at : Access a single value for a row/column label pair.\n",
      "    DataFrame.iloc : Access group of rows and columns by integer position(s).\n",
      "    DataFrame.xs : Returns a cross-section (row(s) or column(s)) from the\n",
      "        Series/DataFrame.\n",
      "    Series.loc : Access group of values using labels.\n",
      "    \n",
      "    Examples\n",
      "    --------\n",
      "    **Getting values**\n",
      "    \n",
      "    >>> df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],\n",
      "    ...      index=['cobra', 'viper', 'sidewinder'],\n",
      "    ...      columns=['max_speed', 'shield'])\n",
      "    >>> df\n",
      "                max_speed  shield\n",
      "    cobra               1       2\n",
      "    viper               4       5\n",
      "    sidewinder          7       8\n",
      "    \n",
      "    Single label. Note this returns the row as a Series.\n",
      "    \n",
      "    >>> df.loc['viper']\n",
      "    max_speed    4\n",
      "    shield       5\n",
      "    Name: viper, dtype: int64\n",
      "    \n",
      "    List of labels. Note using ``[[]]`` returns a DataFrame.\n",
      "    \n",
      "    >>> df.loc[['viper', 'sidewinder']]\n",
      "                max_speed  shield\n",
      "    viper               4       5\n",
      "    sidewinder          7       8\n",
      "    \n",
      "    Single label for row and column\n",
      "    \n",
      "    >>> df.loc['cobra', 'shield']\n",
      "    2\n",
      "    \n",
      "    Slice with labels for row and single label for column. As mentioned\n",
      "    above, note that both the start and stop of the slice are included.\n",
      "    \n",
      "    >>> df.loc['cobra':'viper', 'max_speed']\n",
      "    cobra    1\n",
      "    viper    4\n",
      "    Name: max_speed, dtype: int64\n",
      "    \n",
      "    Boolean list with the same length as the row axis\n",
      "    \n",
      "    >>> df.loc[[False, False, True]]\n",
      "                max_speed  shield\n",
      "    sidewinder          7       8\n",
      "    \n",
      "    Alignable boolean Series:\n",
      "    \n",
      "    >>> df.loc[pd.Series([False, True, False],\n",
      "    ...        index=['viper', 'sidewinder', 'cobra'])]\n",
      "                max_speed  shield\n",
      "    sidewinder          7       8\n",
      "    \n",
      "    Index (same behavior as ``df.reindex``)\n",
      "    \n",
      "    >>> df.loc[pd.Index([\"cobra\", \"viper\"], name=\"foo\")]\n",
      "           max_speed  shield\n",
      "    foo\n",
      "    cobra          1       2\n",
      "    viper          4       5\n",
      "    \n",
      "    Conditional that returns a boolean Series\n",
      "    \n",
      "    >>> df.loc[df['shield'] > 6]\n",
      "                max_speed  shield\n",
      "    sidewinder          7       8\n",
      "    \n",
      "    Conditional that returns a boolean Series with column labels specified\n",
      "    \n",
      "    >>> df.loc[df['shield'] > 6, ['max_speed']]\n",
      "                max_speed\n",
      "    sidewinder          7\n",
      "    \n",
      "    Callable that returns a boolean Series\n",
      "    \n",
      "    >>> df.loc[lambda df: df['shield'] == 8]\n",
      "                max_speed  shield\n",
      "    sidewinder          7       8\n",
      "    \n",
      "    **Setting values**\n",
      "    \n",
      "    Set value for all items matching the list of labels\n",
      "    \n",
      "    >>> df.loc[['viper', 'sidewinder'], ['shield']] = 50\n",
      "    >>> df\n",
      "                max_speed  shield\n",
      "    cobra               1       2\n",
      "    viper               4      50\n",
      "    sidewinder          7      50\n",
      "    \n",
      "    Set value for an entire row\n",
      "    \n",
      "    >>> df.loc['cobra'] = 10\n",
      "    >>> df\n",
      "                max_speed  shield\n",
      "    cobra              10      10\n",
      "    viper               4      50\n",
      "    sidewinder          7      50\n",
      "    \n",
      "    Set value for an entire column\n",
      "    \n",
      "    >>> df.loc[:, 'max_speed'] = 30\n",
      "    >>> df\n",
      "                max_speed  shield\n",
      "    cobra              30      10\n",
      "    viper              30      50\n",
      "    sidewinder         30      50\n",
      "    \n",
      "    Set value for rows matching callable condition\n",
      "    \n",
      "    >>> df.loc[df['shield'] > 35] = 0\n",
      "    >>> df\n",
      "                max_speed  shield\n",
      "    cobra              30      10\n",
      "    viper               0       0\n",
      "    sidewinder          0       0\n",
      "    \n",
      "    **Getting values on a DataFrame with an index that has integer labels**\n",
      "    \n",
      "    Another example using integers for the index\n",
      "    \n",
      "    >>> df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],\n",
      "    ...      index=[7, 8, 9], columns=['max_speed', 'shield'])\n",
      "    >>> df\n",
      "       max_speed  shield\n",
      "    7          1       2\n",
      "    8          4       5\n",
      "    9          7       8\n",
      "    \n",
      "    Slice with integer labels for rows. As mentioned above, note that both\n",
      "    the start and stop of the slice are included.\n",
      "    \n",
      "    >>> df.loc[7:9]\n",
      "       max_speed  shield\n",
      "    7          1       2\n",
      "    8          4       5\n",
      "    9          7       8\n",
      "    \n",
      "    **Getting values with a MultiIndex**\n",
      "    \n",
      "    A number of examples using a DataFrame with a MultiIndex\n",
      "    \n",
      "    >>> tuples = [\n",
      "    ...    ('cobra', 'mark i'), ('cobra', 'mark ii'),\n",
      "    ...    ('sidewinder', 'mark i'), ('sidewinder', 'mark ii'),\n",
      "    ...    ('viper', 'mark ii'), ('viper', 'mark iii')\n",
      "    ... ]\n",
      "    >>> index = pd.MultiIndex.from_tuples(tuples)\n",
      "    >>> values = [[12, 2], [0, 4], [10, 20],\n",
      "    ...         [1, 4], [7, 1], [16, 36]]\n",
      "    >>> df = pd.DataFrame(values, columns=['max_speed', 'shield'], index=index)\n",
      "    >>> df\n",
      "                         max_speed  shield\n",
      "    cobra      mark i           12       2\n",
      "               mark ii           0       4\n",
      "    sidewinder mark i           10      20\n",
      "               mark ii           1       4\n",
      "    viper      mark ii           7       1\n",
      "               mark iii         16      36\n",
      "    \n",
      "    Single label. Note this returns a DataFrame with a single index.\n",
      "    \n",
      "    >>> df.loc['cobra']\n",
      "             max_speed  shield\n",
      "    mark i          12       2\n",
      "    mark ii          0       4\n",
      "    \n",
      "    Single index tuple. Note this returns a Series.\n",
      "    \n",
      "    >>> df.loc[('cobra', 'mark ii')]\n",
      "    max_speed    0\n",
      "    shield       4\n",
      "    Name: (cobra, mark ii), dtype: int64\n",
      "    \n",
      "    Single label for row and column. Similar to passing in a tuple, this\n",
      "    returns a Series.\n",
      "    \n",
      "    >>> df.loc['cobra', 'mark i']\n",
      "    max_speed    12\n",
      "    shield        2\n",
      "    Name: (cobra, mark i), dtype: int64\n",
      "    \n",
      "    Single tuple. Note using ``[[]]`` returns a DataFrame.\n",
      "    \n",
      "    >>> df.loc[[('cobra', 'mark ii')]]\n",
      "                   max_speed  shield\n",
      "    cobra mark ii          0       4\n",
      "    \n",
      "    Single tuple for the index with a single label for the column\n",
      "    \n",
      "    >>> df.loc[('cobra', 'mark i'), 'shield']\n",
      "    2\n",
      "    \n",
      "    Slice from index tuple to single label\n",
      "    \n",
      "    >>> df.loc[('cobra', 'mark i'):'viper']\n",
      "                         max_speed  shield\n",
      "    cobra      mark i           12       2\n",
      "               mark ii           0       4\n",
      "    sidewinder mark i           10      20\n",
      "               mark ii           1       4\n",
      "    viper      mark ii           7       1\n",
      "               mark iii         16      36\n",
      "    \n",
      "    Slice from index tuple to index tuple\n",
      "    \n",
      "    >>> df.loc[('cobra', 'mark i'):('viper', 'mark ii')]\n",
      "                        max_speed  shield\n",
      "    cobra      mark i          12       2\n",
      "               mark ii          0       4\n",
      "    sidewinder mark i          10      20\n",
      "               mark ii          1       4\n",
      "    viper      mark ii          7       1\n",
      "\n"
     ]
    }
   ],
   "source": [
    "help(pd.Series.loc)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 输入/输出"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 读取/写入CSV"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.to_csv('myDataFrame.csv', index=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Country</th>\n",
       "      <th>Capital</th>\n",
       "      <th>Population</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Belgium</td>\n",
       "      <td>Brussels</td>\n",
       "      <td>11190846</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>India</td>\n",
       "      <td>New Delhi</td>\n",
       "      <td>1303171035</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Brazil</td>\n",
       "      <td>Brasília</td>\n",
       "      <td>207847528</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Country    Capital  Population\n",
       "0  Belgium   Brussels    11190846\n",
       "1    India  New Delhi  1303171035\n",
       "2   Brazil   Brasília   207847528"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.read_csv('myDataFrame.csv', nrows=5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 读取/写入Excel"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.to_excel('myDataFrame.xlsx', index=False, sheet_name='Sheet1')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Country</th>\n",
       "      <th>Capital</th>\n",
       "      <th>Population</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Belgium</td>\n",
       "      <td>Brussels</td>\n",
       "      <td>11190846</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>India</td>\n",
       "      <td>New Delhi</td>\n",
       "      <td>1303171035</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Brazil</td>\n",
       "      <td>Brasília</td>\n",
       "      <td>207847528</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Country    Capital  Population\n",
       "0  Belgium   Brussels    11190846\n",
       "1    India  New Delhi  1303171035\n",
       "2   Brazil   Brasília   207847528"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.read_excel('myDataFrame.xlsx')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "xlsx = pd.ExcelFile('myDataFrame.xlsx')   #读取内含多个表的Excel"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.read_excel(xlsx, 'Sheet1')   #读取多表Excel中的Sheet1表"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Country</th>\n",
       "      <th>Capital</th>\n",
       "      <th>Population</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Belgium</td>\n",
       "      <td>Brussels</td>\n",
       "      <td>11190846</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>India</td>\n",
       "      <td>New Delhi</td>\n",
       "      <td>1303171035</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Brazil</td>\n",
       "      <td>Brasília</td>\n",
       "      <td>207847528</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Country    Capital  Population\n",
       "0  Belgium   Brussels    11190846\n",
       "1    India  New Delhi  1303171035\n",
       "2   Brazil   Brasília   207847528"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 读取和写入 SQL 查询及数据库表"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sqlalchemy import create_engine"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "engine = create_engine('sqlite:///:memory:')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pd.read_sql(\"SELECT * FROM my_table;\", engine)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pd.read_sql_table('my_table', engine)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pd.read_sql_query(\"SELECT * FROM my_table;\", engine)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**read_sql()是 read_sql_table() 与 read_sql_query()的便捷打包器**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pd.to_sql('myDf', engine)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 筛选数据"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 取值"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "-5"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s['b']   #取序列的值"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Country</th>\n",
       "      <th>Capital</th>\n",
       "      <th>Population</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>India</td>\n",
       "      <td>New Delhi</td>\n",
       "      <td>1303171035</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Brazil</td>\n",
       "      <td>Brasília</td>\n",
       "      <td>207847528</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  Country    Capital  Population\n",
       "1   India  New Delhi  1303171035\n",
       "2  Brazil   Brasília   207847528"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[1:]   #取数据帧的子集"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 选取、布尔索引及设置值"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**按位置**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Country</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Belgium</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Country\n",
       "0  Belgium"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.iloc[[0], [0]]   #按行与列的位置选择某值"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Belgium'"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.iat[0, 0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**按标签**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Country</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Belgium</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Country\n",
       "0  Belgium"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[[0], ['Country']]   #按行与列的名称选择某值"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Belgium'"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.at[0, 'Country']   #按行与列的名称选择某值"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**按标签/位置**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Country          Brazil\n",
       "Capital        Brasília\n",
       "Population    207847528\n",
       "Name: 2, dtype: object"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[2]   #选择某行"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0     Brussels\n",
       "1    New Delhi\n",
       "2     Brasília\n",
       "Name: Capital, dtype: object"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[:, 'Capital']   #选择某列"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'New Delhi'"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[1, 'Capital'] #按行列取值"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**布尔索引**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "b   -5\n",
       "dtype: int64"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s[~(s > 1)]   #序列S中没有大于1的值"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a    3\n",
       "b   -5\n",
       "c    7\n",
       "d    4\n",
       "dtype: int64"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s[(s < -1) | (s > 2)]   #序列S中小于-1或大于2的值"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Country</th>\n",
       "      <th>Capital</th>\n",
       "      <th>Population</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>India</td>\n",
       "      <td>New Delhi</td>\n",
       "      <td>1303171035</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  Country    Capital  Population\n",
       "1   India  New Delhi  1303171035"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[df['Population']>1200000000]   #选择数据帧中Population大于12亿的数据"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Country</th>\n",
       "      <th>Capital</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>India</td>\n",
       "      <td>New Delhi</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  Country    Capital\n",
       "1   India  New Delhi"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[df['Population']>1200000000, ['Country', 'Capital']]   #选择数据帧中人口大于12亿的数据'Country'和'Capital'字段"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**设置值**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [],
   "source": [
    "s['a'] = 6   #将序列s中索引为a的值设为6"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 删除数据"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**通过drop函数删除数据**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "b   -5\n",
       "d    4\n",
       "dtype: int64"
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s.drop(['a', 'c'])   #按索引删除序列的值(axis=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Capital</th>\n",
       "      <th>Population</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Brussels</td>\n",
       "      <td>11190846</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>New Delhi</td>\n",
       "      <td>1303171035</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Brasília</td>\n",
       "      <td>207847528</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     Capital  Population\n",
       "0   Brussels    11190846\n",
       "1  New Delhi  1303171035\n",
       "2   Brasília   207847528"
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.drop('Country', axis=1)   #按列名删除数据帧的列(axis=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 排序和排名"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**根据索引或者值进行排序**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Country</th>\n",
       "      <th>Capital</th>\n",
       "      <th>Population</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Belgium</td>\n",
       "      <td>Brussels</td>\n",
       "      <td>11190846</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>India</td>\n",
       "      <td>New Delhi</td>\n",
       "      <td>1303171035</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Brazil</td>\n",
       "      <td>Brasília</td>\n",
       "      <td>207847528</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Country    Capital  Population\n",
       "0  Belgium   Brussels    11190846\n",
       "1    India  New Delhi  1303171035\n",
       "2   Brazil   Brasília   207847528"
      ]
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.sort_index()   #按索引排序"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Country</th>\n",
       "      <th>Capital</th>\n",
       "      <th>Population</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Belgium</td>\n",
       "      <td>Brussels</td>\n",
       "      <td>11190846</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Brazil</td>\n",
       "      <td>Brasília</td>\n",
       "      <td>207847528</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>India</td>\n",
       "      <td>New Delhi</td>\n",
       "      <td>1303171035</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Country    Capital  Population\n",
       "0  Belgium   Brussels    11190846\n",
       "2   Brazil   Brasília   207847528\n",
       "1    India  New Delhi  1303171035"
      ]
     },
     "execution_count": 47,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.sort_values(by='Country')   #按某列的值排序"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Country</th>\n",
       "      <th>Capital</th>\n",
       "      <th>Population</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>3.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>3.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>2.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Country  Capital  Population\n",
       "0      1.0      2.0         1.0\n",
       "1      3.0      3.0         3.0\n",
       "2      2.0      1.0         2.0"
      ]
     },
     "execution_count": 48,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.rank()   #数据帧排名"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 查询信息与计算"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 基本信息"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(3, 3)"
      ]
     },
     "execution_count": 49,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.shape   #(行,列)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "RangeIndex(start=0, stop=3, step=1)"
      ]
     },
     "execution_count": 50,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.index   #获取索引"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['Country', 'Capital', 'Population'], dtype='object')"
      ]
     },
     "execution_count": 51,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.columns   #获取列名"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 3 entries, 0 to 2\n",
      "Data columns (total 3 columns):\n",
      " #   Column      Non-Null Count  Dtype \n",
      "---  ------      --------------  ----- \n",
      " 0   Country     3 non-null      object\n",
      " 1   Capital     3 non-null      object\n",
      " 2   Population  3 non-null      int64 \n",
      "dtypes: int64(1), object(2)\n",
      "memory usage: 200.0+ bytes\n"
     ]
    }
   ],
   "source": [
    "df.info()   #获取数据帧基本信息"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Country       3\n",
       "Capital       3\n",
       "Population    3\n",
       "dtype: int64"
      ]
     },
     "execution_count": 53,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.count()   #非Na值的数量"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 汇总"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Country              BelgiumIndiaBrazil\n",
       "Capital       BrusselsNew DelhiBrasília\n",
       "Population                   1522209409\n",
       "dtype: object"
      ]
     },
     "execution_count": 54,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.sum()   #合计"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Country</th>\n",
       "      <th>Capital</th>\n",
       "      <th>Population</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Belgium</td>\n",
       "      <td>Brussels</td>\n",
       "      <td>11190846</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>BelgiumIndia</td>\n",
       "      <td>BrusselsNew Delhi</td>\n",
       "      <td>1314361881</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>BelgiumIndiaBrazil</td>\n",
       "      <td>BrusselsNew DelhiBrasília</td>\n",
       "      <td>1522209409</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              Country                    Capital  Population\n",
       "0             Belgium                   Brussels    11190846\n",
       "1        BelgiumIndia          BrusselsNew Delhi  1314361881\n",
       "2  BelgiumIndiaBrazil  BrusselsNew DelhiBrasília  1522209409"
      ]
     },
     "execution_count": 55,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.cumsum()   #累计"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.008587396204673933"
      ]
     },
     "execution_count": 59,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['Population'].min()/df['Population'].max()   #最小值除以最大值"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.0"
      ]
     },
     "execution_count": 60,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['Population'].idxmin()/df['Population'].idxmax()   #索引最小值除以索引最大值"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Population</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>3.000000e+00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>5.074031e+08</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>6.961346e+08</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>1.119085e+07</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>1.095192e+08</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>2.078475e+08</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>7.555093e+08</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>1.303171e+09</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         Population\n",
       "count  3.000000e+00\n",
       "mean   5.074031e+08\n",
       "std    6.961346e+08\n",
       "min    1.119085e+07\n",
       "25%    1.095192e+08\n",
       "50%    2.078475e+08\n",
       "75%    7.555093e+08\n",
       "max    1.303171e+09"
      ]
     },
     "execution_count": 61,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.describe()   #基础统计数据"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Population    5.074031e+08\n",
       "dtype: float64"
      ]
     },
     "execution_count": 62,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.mean()   #平均值"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Population    207847528.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 63,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.median()   #中位数"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 应用函数"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**通过apply函数应用变换**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "metadata": {},
   "outputs": [],
   "source": [
    "f = lambda x: x*2   #应用匿名函数lambda"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Country</th>\n",
       "      <th>Capital</th>\n",
       "      <th>Population</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>BelgiumBelgium</td>\n",
       "      <td>BrusselsBrussels</td>\n",
       "      <td>22381692</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>IndiaIndia</td>\n",
       "      <td>New DelhiNew Delhi</td>\n",
       "      <td>2606342070</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>BrazilBrazil</td>\n",
       "      <td>BrasíliaBrasília</td>\n",
       "      <td>415695056</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          Country             Capital  Population\n",
       "0  BelgiumBelgium    BrusselsBrussels    22381692\n",
       "1      IndiaIndia  New DelhiNew Delhi  2606342070\n",
       "2    BrazilBrazil    BrasíliaBrasília   415695056"
      ]
     },
     "execution_count": 65,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.apply(f)   # 应用函数"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Country</th>\n",
       "      <th>Capital</th>\n",
       "      <th>Population</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>BelgiumBelgium</td>\n",
       "      <td>BrusselsBrussels</td>\n",
       "      <td>22381692</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>IndiaIndia</td>\n",
       "      <td>New DelhiNew Delhi</td>\n",
       "      <td>2606342070</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>BrazilBrazil</td>\n",
       "      <td>BrasíliaBrasília</td>\n",
       "      <td>415695056</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          Country             Capital  Population\n",
       "0  BelgiumBelgium    BrusselsBrussels    22381692\n",
       "1      IndiaIndia  New DelhiNew Delhi  2606342070\n",
       "2    BrazilBrazil    BrasíliaBrasília   415695056"
      ]
     },
     "execution_count": 66,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.applymap(f)   #对每个单元格应用函数"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 数据对齐"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 内部数据对齐"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**如有不一致的索引，则使用NA值：**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "metadata": {},
   "outputs": [],
   "source": [
    "s3 = pd.Series([7, -2, 3], index=['a', 'c', 'd'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a    13.0\n",
       "b     NaN\n",
       "c     5.0\n",
       "d     7.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 68,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s + s3"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 使用 Fill 方法运算"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**还可以使用 Fill 方法****补齐缺失后再****运算：**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a    13.0\n",
       "b    -5.0\n",
       "c     5.0\n",
       "d     7.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 69,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s.add(s3, fill_value=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a   -1.0\n",
       "b   -7.0\n",
       "c    9.0\n",
       "d    1.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 70,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s.sub(s3, fill_value=2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a    0.857143\n",
       "b   -1.250000\n",
       "c   -3.500000\n",
       "d    1.333333\n",
       "dtype: float64"
      ]
     },
     "execution_count": 71,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s.div(s3, fill_value=4)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a    42.0\n",
       "b   -15.0\n",
       "c   -14.0\n",
       "d    12.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 72,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s.mul(s3, fill_value=3)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.7"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {
    "height": "calc(100% - 180px)",
    "left": "10px",
    "top": "150px",
    "width": "307.797px"
   },
   "toc_section_display": true,
   "toc_window_display": true
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
